Efficient text chunking using linear kernel with masked method

Authors

  • Yu-Chieh Wu
  • Chia-Hui Chang
Abstract

In this paper, we propose an efficient and accurate text chunking system that uses a linear kernel SVM and a new technique called the masked method. Previous research indicated that system combination or external parsers can enhance chunking performance. However, the cost of constructing multiple classifiers is even higher than that of developing a single processor. Moreover, the use of external resources complicates the original tagging process. To remedy these problems, we employ richer features and propose a mask-based method that addresses the unknown word problem to enhance system performance. In this way, no external resources or complex heuristics are required for the chunking system. The experiments show that, when trained on the CoNLL-2000 chunking dataset, our system achieves an F(β) rate of 94.12 with a linear kernel. Furthermore, our chunker is quite efficient since it adopts a linear kernel SVM. The turn-around tagging time on the CoNLL-2000 test data is less than 50 s, which is about 115 times faster than a polynomial kernel SVM. © 2006 Elsevier B.V. All rights reserved.
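
The efficiency claim rests on a general property of linear kernels: the sum over support vectors can be folded into a single weight vector, so the cost of scoring one token no longer grows with the number of support vectors. The Python sketch below illustrates that property on random toy data. It is not the authors' implementation; every function name, the degree-2 polynomial kernel, and the data sizes are assumptions chosen for illustration.

```python
import random

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def poly2_kernel(u, v):
    # Degree-2 polynomial kernel, an assumed example of a non-linear kernel.
    return (dot(u, v) + 1.0) ** 2

def kernel_score(x, support_vectors, coeffs, bias, kernel):
    """Kernel-form decision value: one kernel evaluation per support vector."""
    return bias + sum(a * kernel(sv, x) for sv, a in zip(support_vectors, coeffs))

def collapse_linear(support_vectors, coeffs):
    """Fold all support vectors into one weight vector w = sum_i a_i * sv_i."""
    w = [0.0] * len(support_vectors[0])
    for sv, a in zip(support_vectors, coeffs):
        for j, value in enumerate(sv):
            w[j] += a * value
    return w

def linear_score(x, w, bias):
    """Primal-form decision value: a single dot product, independent of #SVs."""
    return bias + dot(w, x)

# Hypothetical toy model: 200 support vectors in 10 dimensions.
rng = random.Random(0)
n_sv, dim = 200, 10
support_vectors = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_sv)]
coeffs = [rng.uniform(-1, 1) for _ in range(n_sv)]
bias = 0.5
x = [rng.uniform(-1, 1) for _ in range(dim)]

w = collapse_linear(support_vectors, coeffs)          # done once, after training
slow = kernel_score(x, support_vectors, coeffs, bias, linear_kernel)
fast = linear_score(x, w, bias)                       # done per token
poly = kernel_score(x, support_vectors, coeffs, bias, poly2_kernel)

print("linear scores agree:", abs(slow - fast) < 1e-9)
```

A polynomial kernel has no such compact form unless the feature combinations are expanded explicitly, so `kernel_score` with `poly2_kernel` has to visit every support vector for each prediction. This is consistent with the speed gap reported in the abstract, although the actual factor depends on the model and the data.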

Similar resources

Arbitrary Phrase Identification using Linear Kernel with Mask Method

In this paper, we propose an efficient and accurate text chunking system that uses a linear kernel SVM and a new technique called the mask method. Previous research indicated that system combination or external parsers can enhance chunking performance. However, the cost of constructing multiple classifiers is even higher than that of developing a single processor. Besides, the use of external resources will...

Large Scale Learning with String Kernels

In applications of bioinformatics and text processing, such as splice site recognition and spam detection, large amounts of training sequences are available and needed to achieve sufficiently high prediction performance on classification or regression tasks. Although kernel-based methods such as SVMs often achieve state-of-the-art results, training and evaluation times may be prohibitively larg...

Chunking for massive nonlinear kernel classification

A chunking procedure [2] utilized in [18] for linear classifiers is proposed here for nonlinear kernel classification of massive datasets. A highly accurate algorithm based on nonlinear support vector machines that utilizes a linear programming formulation [15] is developed here as a completely unconstrained minimization problem [17]. This approach together with chunking leads to a simple and a...

Fast Methods for Kernel-Based Text Analysis

Kernel-based learning (e.g., Support Vector Machines) has been successfully applied to many hard problems in Natural Language Processing (NLP). In NLP, although feature combinations are crucial to improving performance, they are heuristically selected. Kernel methods change this situation. The merit of the kernel methods is that effective feature combination is implicitly expanded without loss ...

Shan Feng-20151710-TOAUTOCJ-MS

The paper analyses multiple kernel learning-based face recognition in the pattern matching area. Based on an analysis of the basic theory of multiple kernel SVM, it focuses on the multiple kernel SVM algorithm based on a semi-infinite linear program (SILP), including SILP based on column generation (CG) and SILP based on a chunking algorithm (CA). The two SILP improved algorithms are appl...

Journal:
  • Knowl.-Based Syst.

Volume 20  Issue 

Pages  -

Publication date: 2007